Estimation and control in finite Markov decision processes with the average reward criterion



Similar resources

Bounded Parameter Markov Decision Processes with Average Reward Criterion

Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, the notion of an optimal policy for a BMDP is not entirely straightforward. We consider two notions of optimality based on optimistic and pessimistic criteria. These have been analyzed for discounted BMDPs. Here we pro...


Average-Reward Decentralized Markov Decision Processes

Formal analysis of decentralized decision making has become a thriving research area in recent years, producing a number of multi-agent extensions of Markov decision processes. While much of the work has focused on optimizing discounted cumulative reward, optimizing average reward is sometimes a more suitable criterion. We formalize a class of such problems and analyze its characteristics, show...


Fuzzy Decision Processes with an Average Reward Criterion

Within the same framework as fuzzy decision processes for the discounted case, we specify an average fuzzy criterion model and develop its optimization by “fuzzy max order” under appropriate conditions. The average reward is characterized, by introducing a relative value function, as the unique solution of the associated equation. We also derive the optimality equation using the “vanishing disco...


Optimal Control of Average Reward Constrained Continuous-Time Finite Markov Decision Processes

The paper studies optimization of average-reward continuous-time finite state and action Markov Decision Processes with multiple criteria and constraints. Under the standard unichain assumption, we prove the existence of optimal K-switching strategies for feasible problems with K constraints. For switching randomized strategies, the decisions depend on the current state and the time spent i...


Pseudometrics for State Aggregation in Average Reward Markov Decision Processes

We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds on the loss that may be caused by working on the aggregated instead of the original MDP are given and compared ...



Journal

Journal title: Applicationes Mathematicae

Year: 2004

ISSN: 1233-7234,1730-6280

DOI: 10.4064/am31-2-1